
feat(compare): add counterfactual pricing #238

Open
ozymandiashh wants to merge 1 commit into getagentseal:main from ozymandiashh:feat/counterfactual-pricing

@ozymandiashh
Contributor

Summary

This adds a pricing-only what-if mode to codeburn compare: codeburn compare --reprice <model>. The normal interactive model comparison is unchanged, but users can now ask how their recorded usage would have been priced if every call had used a different model.

The calculation keeps the original usage shape: input tokens, output tokens, cache creation, cache reads, web-search requests, and fast-mode flags are reused exactly as recorded. Only the model pricing table changes. That makes the output useful for budget planning while keeping the scope honest: it does not claim to simulate model quality, different reasoning behavior, or shorter/longer outputs.
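As a rough illustration of the "same usage, different price table" idea, the TypeScript sketch below prices one recorded call. The `Usage` and `ModelPricing` shapes, the per-million-token units, and the flat `fastMultiplier` are assumptions made for this example; the PR does not show codeburn's internal types or how fast-mode pricing is actually modeled.

```typescript
// Hypothetical shapes: codeburn's real internal types are not shown in the PR.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cacheCreationTokens: number;
  cacheReadTokens: number;
  webSearchRequests: number;
  fast: boolean; // recorded fast-mode flag, reused as-is
}

interface ModelPricing {
  inputPerMTok: number; // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
  cacheWritePerMTok: number; // USD per million cache-creation tokens
  cacheReadPerMTok: number; // USD per million cache-read tokens
  webSearchPerRequest: number; // USD per web-search request
  fastMultiplier: number; // ASSUMPTION: flat multiplier on token cost in fast mode
}

// Price one recorded call. The usage is reused exactly; only `pricing` changes.
function priceCall(usage: Usage, pricing: ModelPricing): number {
  const tokenCost =
    (usage.inputTokens * pricing.inputPerMTok +
      usage.outputTokens * pricing.outputPerMTok +
      usage.cacheCreationTokens * pricing.cacheWritePerMTok +
      usage.cacheReadTokens * pricing.cacheReadPerMTok) /
    1_000_000;
  const adjusted = usage.fast ? tokenCost * pricing.fastMultiplier : tokenCost;
  return adjusted + usage.webSearchRequests * pricing.webSearchPerRequest;
}
```

Calling `priceCall` once with the source model's pricing and once with the target model's pricing yields the actual and what-if cost of the same call; nothing about the usage itself is altered.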

What changed

  • add codeburn compare --reprice <model> as a non-interactive what-if pricing mode
  • add --json for scripted/reporting workflows when --reprice is used
  • validate the target model before parsing sessions, using the existing pricing and alias lookup
  • report actual spend, what-if spend, savings or added cost, and percent difference
  • surface the largest session impacts, plus project and source-model breakdowns
  • skip <synthetic> calls so provider bookkeeping entries are not repriced as real API calls
  • preserve fast-mode pricing semantics for the target model
  • document the new workflow and add an Unreleased changelog entry
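A minimal sketch of the up-front model validation and the `<synthetic>` filter, assuming a simple in-memory pricing map and alias map. `PRICING`, `ALIASES`, `resolveTargetModel`, and `billableCalls` are hypothetical names, and the model entries are placeholders rather than codeburn's real table.

```typescript
// Hypothetical pricing and alias tables; placeholder models, not real price data.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING = new Map<string, ModelPrice>([
  ["model-a", { inputPerMTok: 3, outputPerMTok: 15 }],
  ["model-b", { inputPerMTok: 1, outputPerMTok: 5 }],
]);

const ALIASES = new Map<string, string>([
  ["a", "model-a"],
  ["b", "model-b"],
]);

interface RecordedCall {
  model: string;
  sessionId: string;
}

// Resolve the --reprice target before any session is parsed, so a bad name fails fast.
function resolveTargetModel(name: string): string {
  const canonical = ALIASES.get(name) ?? name;
  if (!PRICING.has(canonical)) {
    throw new Error(`Unknown model for --reprice: ${name}`);
  }
  return canonical;
}

// Drop provider bookkeeping entries so they are never priced as real API calls.
function billableCalls(calls: RecordedCall[]): RecordedCall[] {
  return calls.filter((call) => call.model !== "<synthetic>");
}
```

Resolving the name first means a typo in `--reprice` is reported immediately instead of after a full session scan. Using `Map` rather than a plain object also keeps inherited keys such as `toString` from passing the lookup.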

Example

codeburn compare --reprice claude-sonnet-4-5
codeburn compare -p week --provider claude --reprice gpt-5.3-codex
codeburn compare --reprice gpt-4o-mini --json

Output shape

The text output is meant for quick budget decisions:

  • total actual vs what-if spend
  • net savings or added cost
  • top session impacts sorted by absolute difference
  • project-level impact
  • source-model impact
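The aggregation behind these sections could look roughly like the following sketch. The `PricedCall` record, the sign convention (a negative delta means savings), and reporting 0% when actual spend is zero are assumptions for illustration, not details confirmed by the PR.

```typescript
// Hypothetical per-call record once both the actual and what-if prices are known.
interface PricedCall {
  sessionId: string;
  project: string;
  sourceModel: string;
  actualUsd: number;
  whatIfUsd: number;
}

interface Impact {
  key: string;
  actualUsd: number;
  whatIfUsd: number;
  deltaUsd: number; // whatIf - actual: negative = savings, positive = added cost
}

// Sum calls per key, then sort by absolute change so big savings and big increases both surface.
function groupImpact(calls: PricedCall[], keyOf: (c: PricedCall) => string): Impact[] {
  const byKey = new Map<string, Impact>();
  for (const c of calls) {
    const key = keyOf(c);
    const entry = byKey.get(key) ?? { key, actualUsd: 0, whatIfUsd: 0, deltaUsd: 0 };
    entry.actualUsd += c.actualUsd;
    entry.whatIfUsd += c.whatIfUsd;
    entry.deltaUsd = entry.whatIfUsd - entry.actualUsd;
    byKey.set(key, entry);
  }
  return [...byKey.values()].sort((x, y) => Math.abs(y.deltaUsd) - Math.abs(x.deltaUsd));
}

function summarize(calls: PricedCall[], topN = 5) {
  const actualUsd = calls.reduce((sum, c) => sum + c.actualUsd, 0);
  const whatIfUsd = calls.reduce((sum, c) => sum + c.whatIfUsd, 0);
  const deltaUsd = whatIfUsd - actualUsd;
  // ASSUMPTION: report 0% rather than dividing by zero when nothing was spent.
  const percent = actualUsd === 0 ? 0 : (deltaUsd / actualUsd) * 100;
  return {
    actualUsd,
    whatIfUsd,
    deltaUsd,
    percent,
    topSessions: groupImpact(calls, (c) => c.sessionId).slice(0, topN),
    projects: groupImpact(calls, (c) => c.project),
    sourceModels: groupImpact(calls, (c) => c.sourceModel),
  };
}
```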

The JSON output returns the same structure with amounts as USD fields, so downstream scripts do not have to parse localized currency strings.
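To illustrate why the JSON mode keeps amounts as plain numbers, this small sketch contrasts locale-aware text formatting with JSON serialization. The `RepriceSummary` field names and the `formatText`/`formatJson` helpers are hypothetical; the PR does not list the JSON schema.

```typescript
// Hypothetical summary fields; amounts are plain USD numbers.
interface RepriceSummary {
  actualUsd: number;
  whatIfUsd: number;
  deltaUsd: number; // negative = savings, positive = added cost
  percent: number;
}

// Text mode: human-friendly currency strings whose shape depends on the locale.
function formatText(s: RepriceSummary, locale = "en-US"): string {
  const usd = new Intl.NumberFormat(locale, { style: "currency", currency: "USD" });
  const label = s.deltaUsd < 0 ? "savings" : s.deltaUsd > 0 ? "added cost" : "no change";
  return (
    `actual ${usd.format(s.actualUsd)} | what-if ${usd.format(s.whatIfUsd)} | ` +
    `${label} ${usd.format(Math.abs(s.deltaUsd))} (${s.percent.toFixed(1)}%)`
  );
}

// JSON mode: raw numbers, so consumers never have to parse locale-specific strings.
function formatJson(s: RepriceSummary): string {
  return JSON.stringify(s, null, 2);
}
```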

Validation

  • npx vitest run tests/reprice.test.ts
  • npx vitest run
  • npm run build
  • node dist/cli.js compare --help
  • node dist/cli.js compare --reprice gpt-4o-mini -p today --json
  • node dist/cli.js compare --json

Note: a full npx tsc --noEmit run still fails on origin/main because of existing type errors in src/providers/copilot.ts; this branch does not touch that provider.

ozymandiashh marked this pull request as ready for review on May 5, 2026 22:37
AgentSeal added the needs-testing and needs-validation labels and removed needs-testing on May 12, 2026 (needs-validation: PR requires validation against real-world usage before review).
@ozymandiashh
Contributor Author

ozymandiashh commented May 17, 2026

Validation was run on macOS arm64, with private details omitted. This demonstrates the behavior of the reprice path, not just that the build succeeds.

What was checked:

  • compare --reprice accepted a target model and produced machine-readable JSON.
  • The JSON contained the expected what-if summary: actual cost, repriced cost, savings/added-cost delta, and percentage difference.
  • The project, source-model, and top-session impact sections were present and non-empty on real local usage, showing that the command returns actual results rather than just a header or an empty shell.
  • The tested reprice logic skips <synthetic> calls, so bookkeeping entries are not priced as real API usage.

Commands:

  • npx vitest run tests/reprice.test.ts - 8/8 tests passed.
  • npx tsx src/cli.ts compare --reprice claude-sonnet-4-6 --provider claude -p today --json - JSON parsed and structural checks passed.
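A structural check of the kind described above might look like this. It is a hypothetical sketch: the field and section names are assumptions rather than the schema from the PR, and the function collects a list of problems instead of throwing, so every missing section is reported in one pass.

```typescript
// Hypothetical structural check for `compare --reprice ... --json` output.
// Field names below are assumptions, not codeburn's documented schema.
function checkRepriceJson(raw: string): string[] {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return ["output is not valid JSON"];
  }
  if (typeof data !== "object" || data === null) {
    return ["top-level value is not an object"];
  }
  const report = data as Record<string, unknown>;
  const problems: string[] = [];
  // Summary totals must be numbers, not formatted currency strings.
  for (const field of ["actualUsd", "whatIfUsd", "deltaUsd", "percent"]) {
    if (typeof report[field] !== "number") problems.push(`${field} is not a number`);
  }
  // Impact sections must exist and be non-empty, so the output is more than an empty shell.
  for (const section of ["topSessions", "projects", "sourceModels"]) {
    const value = report[section];
    if (!Array.isArray(value) || value.length === 0) problems.push(`${section} is missing or empty`);
  }
  return problems;
}
```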

No project names, prompts, paths, session IDs, raw costs, usage totals, or private product details are included here.

ozymandiashh force-pushed the feat/counterfactual-pricing branch from d719d80 to b8ce3c6 on May 17, 2026 18:31